11 research outputs found

    Straight to Shapes: Real-time Detection of Encoded Shapes

    Full text link
    Current object detection approaches predict bounding boxes, but these provide little instance-specific information beyond location, scale and aspect ratio. In this work, we propose to directly regress to objects' shapes in addition to their bounding boxes and categories. It is crucial to find an appropriate shape representation that is compact and decodable, and in which objects can be compared for higher-order concepts such as view similarity, pose variation and occlusion. To achieve this, we use a denoising convolutional auto-encoder to establish an embedding space, and place the decoder after a fast end-to-end network trained to regress directly to the encoded shape vectors. This yields what to the best of our knowledge is the first real-time shape prediction network, running at ~35 FPS on a high-end desktop. With higher-order shape reasoning well-integrated into the network pipeline, the network shows the useful practical quality of generalising to unseen categories similar to the ones in the training set, something that most existing approaches fail to handle.Comment: 16 pages including appendix; Published at CVPR 201

    EXtremely PRIvate supervised Learning

    Get PDF
    This paper presents a new approach called ExPriL for learning from extremely private data. Iteratively, the learner supplies a candidate hypothesis and the data curator only releases the marginals of the error incurred by the hypothesis on the privately-held target data. Using the marginals as supervisory signal, the goal is to learn a hypothesis that fits this target data as best as possible. The privacy of the mechanism is provably enforced, assuming that the overall number of iterations is known in advance.
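The learner–curator interaction can be sketched as below. This is a schematic reading of the abstract, not the paper's protocol: the exact form of the error marginals, the use of the Laplace mechanism, the sensitivity bound, and the update rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

X_priv = rng.normal(size=(200, 3))             # privately-held data (curator side)
w_true = np.array([1.0, -2.0, 0.5])
y_priv = (X_priv @ w_true > 0).astype(float)   # private labels

T = 20                 # total number of iterations, fixed in advance
eps_total = 1.0        # overall privacy budget
eps_step = eps_total / T                       # budget spent per release

def curator_release(w):
    """Release noisy per-feature error marginals for hypothesis w."""
    pred = (X_priv @ w > 0).astype(float)
    err = pred - y_priv                        # per-sample signed error
    marginals = (err[:, None] * X_priv).mean(axis=0)
    # Data-dependent bound used here only for illustration; a real
    # mechanism would fix the sensitivity a priori.
    sensitivity = 2 * np.abs(X_priv).max() / len(X_priv)
    noise = rng.laplace(scale=sensitivity / eps_step, size=marginals.shape)
    return marginals + noise

w = np.zeros(3)        # learner's candidate hypothesis
lr = 1.0
for _ in range(T):
    w -= lr * curator_release(w)               # learner updates from marginals only

acc = ((X_priv @ w > 0).astype(float) == y_priv).mean()
print(f"agreement with private labels after {T} rounds: {acc:.2f}")
```

Note how the learner never sees `X_priv` or `y_priv` directly — only the noisy marginals — and how fixing `T` in advance lets the per-step budget be allocated up front, matching the abstract's assumption.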

    Use and examination of convolutional neural networks for scene understanding

    No full text
    This thesis concerns itself with the use and examination of convolutional neural networks in the context of visual scene understanding. Towards this, the first part of the thesis proposes novel extensions to vanilla CNNs. These extensions attempt to incorporate domain knowledge into the computational framework of CNNs in order to better adapt them to targeted visual tasks. We begin by integrating a class prototypical embedding space into a conventional classification network, whereby real-world object samples are recognised by matching them to the correct class visual prototypes in this space. This use of side information not only improves the network's classification performance on the categories seen during training but also boosts the recognition of similar categories that are unseen during training. Likewise, we propose a deep neural model for real-time instance segmentation that makes use of an intermediate shape embedding space. This continuous and learned latent space allows unseen input object images to be matched to new and realistic shape masks at test time. In follow-up work, we draw inspiration from recent advances in network design and training for object detection and segmentation. The assimilation of these techniques into our instance segmentation system allows us to further improve its accuracy while still operating in real time. In yet another application of CNNs, to the task of human saliency estimation, we revisit the interpretation of the task as a competitive process: humans look at some regions of an image at the cost of not looking at others. We thus model the output saliency maps as spatial probability distributions, and propose the use of losses that are suitable for measuring distances between probability distributions to train a deep network for the task. This formulation yields significant gains in the network's predictive performance as measured on an array of saliency metrics.
After augmenting and applying conventional CNNs to a variety of visual tasks, the later part of the thesis shifts its focus to an examination of these networks. We begin by investigating classification CNNs at a qualitative level by modelling network visual attention. In particular, we formulate attention modules that can be trained in tandem with the network weights to optimise the end goal of image classification. The resulting spatial attention scores, associated with the local features at predefined network layers, are able to identify the semantic parts of the input images. In other words, the attention maps are able to suppress the irrelevant and highlight the relevant regions of the input images in a way that lends greater transparency to the inference procedure of nets and also boosts the output classification accuracy. Further, the binarised maps serve as useful weakly-supervised foreground segmentation masks. We then perform a more principled analysis of the class decision functions learned by classification CNNs by contextualising an existing geometrical framework for network decision boundary analysis. Our research uncovers some very intriguing yet simplistic facets of the class score functions learned by these networks that explain their adversarial vulnerability. We identify the fact that specific input image space directions tend to be associated with fixed class identities. This means that simply increasing the magnitude of correlation between an input image and a single image space direction causes the nets to believe that more (or less) of the class is present. This allows us to provide a new perspective on the existence of universal adversarial perturbations. Further, the input image space directions which the networks use to achieve their classification performance are the same along which they are most vulnerable to attack; the vulnerability arises from the rather simplistic non-linear use of the directions. 
Thus, as it stands, the performance and vulnerability of these nets are closely entwined. Various other notable observations emerge from our findings and are discussed in greater detail in the thesis. We conclude by highlighting some open questions in an effort to inform future work in the field.
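The attention-module idea summarised above — compatibility scores between local features and a global descriptor, normalised over spatial locations — can be sketched as follows. The dot-product compatibility function and the array shapes are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def spatial_attention(local_feats, global_feat):
    """Softmax-normalised compatibility scores over spatial locations.

    local_feats: (H*W, C) local descriptors from an intermediate layer
    global_feat: (C,) global image descriptor
    Returns the attention map and the attention-weighted descriptor.
    """
    scores = local_feats @ global_feat          # dot-product compatibility
    scores -= scores.max()                      # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()  # spatial attention map
    attended = attn @ local_feats               # attention-weighted feature
    return attn, attended

rng = np.random.default_rng(0)
local = rng.normal(size=(49, 8))                # e.g. a 7x7 grid of 8-dim features
glob = rng.normal(size=8)
attn, attended = spatial_attention(local, glob)
```

Reshaped back to the spatial grid, `attn` plays the role of the attention map: high values mark the regions the classifier relies on, and thresholding (binarising) it would give the weakly-supervised foreground mask mentioned above.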

    End-to-End Saliency Mapping via Probability Distribution Prediction

    Get PDF
    Most saliency estimation methods aim to explicitly model low-level conspicuity cues such as edges or blobs and may additionally incorporate top-down cues using face or text detection. Data-driven methods for training saliency models using eye-fixation data are increasingly popular, particularly with the introduction of large-scale datasets and deep architectures. However, current methods in this latter paradigm use loss functions designed for classification or regression tasks whereas saliency estimation is evaluated on topographical maps. In this work, we introduce a new saliency map model which formulates a map as a generalized Bernoulli distribution. We then train a deep architecture to predict such maps using novel loss functions which pair the softmax activation function with measures designed to compute distances between probability distributions. We show in extensive experiments the effectiveness of such loss functions over standard ones on four public benchmark datasets, and demonstrate improved performance over state-of-the-art saliency methods.
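The softmax-plus-distribution-distance pairing can be sketched minimally as below, with NumPy standing in for the deep architecture. The KL divergence is one natural distance between a predicted map and a fixation distribution; the paper's exact losses may differ.

```python
import numpy as np

def softmax2d(logits):
    """Softmax over all spatial locations: a map as one distribution."""
    z = logits.reshape(-1) - logits.max()       # stabilise before exponentiating
    p = np.exp(z) / np.exp(z).sum()
    return p.reshape(logits.shape)

def kl_loss(pred_logits, target_map):
    """KL divergence from the (normalised) target fixation map to the prediction."""
    p = (target_map / target_map.sum()).reshape(-1)   # target as a distribution
    q = softmax2d(pred_logits).reshape(-1)
    eps = 1e-12                                       # avoid log(0)
    return float((p * (np.log(p + eps) - np.log(q + eps))).sum())
```

Because both maps are treated as probability distributions over pixels, the loss is zero only when the predicted map matches the target distribution exactly, and it penalises mass placed where no fixations occur — the competitive interpretation of saliency described above.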
